Threshold Interval Indexing for Complicated Uncertain Data
Abstract
Uncertain data is an increasingly prevalent topic in database research, given the advance of instruments that inherently generate uncertainty in their data. In particular, the problem of indexing uncertain data for range queries has received considerable attention. To process range queries efficiently, existing approaches mainly focus on reducing the number of disk I/Os. However, due to the inherent complexity of uncertain data, processing a range query may involve high computational cost in addition to the I/O cost. In this paper, we present a novel indexing strategy for one-dimensional uncertain continuous data, called threshold interval indexing. Threshold interval indexing balances I/O cost against computational cost to achieve optimal overall query performance. A key ingredient of the proposed indexing structure is a dynamic interval tree, which is much more resistant to skew than the R-trees widely used in other indexing structures. We also present a more efficient version of our index, called the memory-loaded threshold interval index, which reduces the storage size so that the primary tree can be loaded into memory. We perform experiments to demonstrate the effectiveness and efficiency of the proposed indexing strategy.
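To make the two costs named in the abstract concrete, here is a minimal sketch of a threshold range query over one-dimensional uncertain data. It is not the paper's index: it scans a plain list rather than a dynamic interval tree, and it assumes each object is an uncertainty interval with a uniform probability density, since the abstract does not specify the pdf. The names `UncertainObject`, `probability_in`, and `threshold_range_query` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UncertainObject:
    """Hypothetical 1-D uncertain object whose value lies in [lo, hi].

    Assumption: the value is uniformly distributed over the interval.
    """
    name: str
    lo: float
    hi: float

    def __post_init__(self):
        if not self.lo < self.hi:
            raise ValueError("expected lo < hi for an uncertainty interval")

    def probability_in(self, a: float, b: float) -> float:
        """Probability that the value falls inside the query range [a, b]."""
        overlap = min(self.hi, b) - max(self.lo, a)
        if overlap <= 0:
            return 0.0
        return overlap / (self.hi - self.lo)


def threshold_range_query(objects, a, b, tau):
    """Return (name, probability) for objects in [a, b] with probability >= tau."""
    results = []
    for obj in objects:
        # Cheap interval test: skip objects that cannot overlap the query.
        # An index would perform this pruning without touching every object.
        if obj.hi < a or obj.lo > b:
            continue
        # Probability computation: trivial for a uniform pdf, but this is the
        # step that becomes expensive for complicated distributions.
        p = obj.probability_in(a, b)
        if p >= tau:
            results.append((obj.name, p))
    return results


objects = [
    UncertainObject("A", 0.0, 10.0),
    UncertainObject("B", 4.0, 6.0),
    UncertainObject("C", 20.0, 30.0),
]
matches = threshold_range_query(objects, 5.0, 15.0, 0.5)
```

In this sketch the interval test stands in for the work an index saves in I/O, while `probability_in` stands in for the per-object computation; a threshold on the probability lets a query discard candidates whose chance of lying in the range is too low.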
Similar resources
Threshold interval indexing techniques for complicated uncertain data
Uncertain data is an increasingly prevalent topic in database research, given the advance of instruments which inherently generate uncertainty in their data. In particular, the problem of indexing uncertain data for range queries has received considerable attention. To efficiently process range queries, existing approaches mainly focus on reducing the number of disk I/Os. However, due to the in...
Efficient Indexing Methods for Probabilistic Threshold Queries over Uncertain Data
It is infeasible for a sensor database to contain the exact value of each sensor at all points in time. This uncertainty is inherent in these systems due to measurement and sampling errors, and resource limitations. In order to avoid drawing erroneous conclusions based upon stale data, the use of uncertainty intervals that model each data item as a range and associated probability density funct...
Probabilistic Threshold Indexing for Uncertain Strings
Strings form a fundamental data type in computer systems. String searching has been extensively studied since the inception of computer science. Increasingly many applications have to deal with imprecise strings or strings with fuzzy information in them. String matching becomes a probabilistic event when a string contains uncertainty, i.e. each position of the string can have different probable...
Indexing Uncertain Categorical Data over Distributed Environment
Today, many applications produce large amounts of uncertain data that traditional database management systems, including their indexing methods, are not suited to handle. In this paper, we propose an inverted-index-based method for efficiently searching uncertain categorical data over distributed environments. We address two kinds of queries over the distributed un...
Interval network data envelopment analysis model for classification of investment companies in the presence of uncertain data
The main purpose of this paper is to propose an approach for measuring the performance of investment companies (ICs), and for classifying and ranking them, by considering internal structure and uncertainty. To reach this goal, the interval network data envelopment analysis (INDEA) models are extended. This model is capable of modeling two-stage efficiency with intermediate measures i...